
Creators/Authors contains: "Jain, Sarthak"


  1. Free, publicly-accessible full text available June 25, 2024
  2. UAVs (unmanned aerial vehicles), or drones, are promising instruments for video-based surveillance. Many aerial surveillance applications use object detection programs to detect target objects. In such applications, three parameters influence a drone deployment strategy: the area covered by the drone, the latency of target (object) detection, and the quality of the detection output by the object detector. Previous works have focused on improving Pareto optimality along the area-latency frontier or the area-quality frontier, but not along the combined area-latency-quality frontier, which makes these solutions sub-optimal for drone-based surveillance. We explore the three-way tradeoff between area, latency, and quality in the context of autonomous aerial surveillance of targets in an area, using drones with cameras and an object detection program. We propose Vega, a drone deployment framework that captures these tradeoffs to deploy drones efficiently. We make three contributions with Vega. First, we characterize the ability of the state-of-the-art mobile object detector, EfficientDet [CVPR '20], to detect objects from varying drone altitudes, using curves of confidence and IoU (intersection over union) versus drone altitude. Second, based on these characteristics of the detector, we propose two algorithmic primitives for drone-based maneuvers, namely DroneZoom and DroneCycle. Using these two primitives, we obtain a better Pareto frontier between our three target parameters (coverage area, detection latency, and detection quality) for a single-drone system. Third, we scale out our findings to a swarm deployment using higher-order Voronoi tessellations, where we control the swarm's spatial density through the Voronoi order to further lower the detection latency while maintaining detection quality.
    Free, publicly-accessible full text available June 19, 2024
  3. Training the deep neural networks that dominate NLP requires large datasets. These are often collected automatically or via crowdsourcing, and may exhibit systematic biases or annotation artifacts. By the latter we mean spurious correlations between inputs and outputs that do not represent a generally held causal relationship between features and classes; models that exploit such correlations may appear to perform a given task well but fail on out-of-sample data. In this paper, we evaluate the use of different attribution methods for aiding the identification of training data artifacts. We propose new hybrid approaches that combine saliency maps (which highlight important input features) with instance attribution methods (which retrieve training samples influential to a given prediction). We show that this proposed training-feature attribution can be used to efficiently uncover artifacts in training data when a challenging validation set is available. We also carry out a small user study to evaluate whether these methods are useful to NLP researchers in practice, with promising results. We make code for all methods and experiments in this paper available.
  6. Efficient and adaptive computer vision systems have been proposed to optimize computer vision tasks, such as image classification and object detection, for embedded or mobile devices. These solutions, which are quite recent, focus on optimizing either the model (a deep neural network, DNN) or the system, by designing an adaptive system with approximation knobs. Despite several recent efforts, we show that existing solutions suffer from two major drawbacks. First, while mobile devices or systems-on-chip (SoCs) usually come with limited resources, including battery power, most systems do not consider the energy consumption of the models during inference. Second, they do not consider the interplay between the three metrics of interest in their configurations, namely latency, accuracy, and energy. In this work, we propose an efficient and adaptive video object detection system, Virtuoso, which is jointly optimized for accuracy, energy efficiency, and latency. Underlying Virtuoso is a multi-branch execution kernel that is capable of running at different operating points in the accuracy-energy-latency axes, and a lightweight runtime scheduler that selects the best-fit execution branch to satisfy the user requirement. We position this work as a first step in understanding the suitability of various object detection kernels on embedded boards in the accuracy-latency-energy axes, opening the door for further development of solutions customized to embedded systems and for benchmarking such solutions. Virtuoso achieves up to 286 FPS on the NVIDIA Jetson AGX Xavier board, which is up to 45 times faster than the baseline EfficientDet D3 and 15 times faster than the baseline EfficientDet D0. In addition, we observe up to 97.2% energy reduction using Virtuoso compared to the baseline YOLO (v3), a widely used object detector designed for mobile devices.
For a fair comparison with Virtuoso, we benchmark 15 state-of-the-art or widely used protocols, including Faster R-CNN (FRCNN) [NeurIPS’15], YOLO v3 [CVPR’16], SSD [ECCV’16], EfficientDet [CVPR’20], SELSA [ICCV’19], MEGA [CVPR’20], REPP [IROS’20], FastAdapt [EMDL’21], and our in-house adaptive variants FRCNN+, YOLO+, SSD+, and EfficientDet+ (our variants have enhanced efficiency for mobile devices). In this comprehensive benchmark, Virtuoso outperforms all of the above protocols, leading the accuracy frontier at every efficiency level on NVIDIA Jetson mobile GPUs. Specifically, Virtuoso achieves an accuracy of 63.9%, more than 10 percentage points higher than popular object detection models such as FRCNN (51.1%) and YOLO (49.5%).
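The Vega abstract (item 2) frames drone deployment as a three-way tradeoff among coverage area, detection latency, and detection quality, and describes improving the Pareto frontier over all three axes. The following minimal Python sketch illustrates what that frontier means. It keeps only the non-dominated configurations and assumes each candidate deployment has already been summarized by three numbers. The `Config` type, the `dominates` and `pareto_frontier` helpers, and all the numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Config:
    """One hypothetical drone deployment option, summarized on three axes."""
    name: str
    area_m2: float    # area covered (higher is better)
    latency_s: float  # time to detect a target (lower is better)
    quality: float    # detection quality, e.g. mean confidence (higher is better)


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on all three axes and strictly better on at least one."""
    no_worse = (a.area_m2 >= b.area_m2 and a.latency_s <= b.latency_s
                and a.quality >= b.quality)
    better = (a.area_m2 > b.area_m2 or a.latency_s < b.latency_s
              or a.quality > b.quality)
    return no_worse and better


def pareto_frontier(configs: List[Config]) -> List[Config]:
    """Return the configs that no other config dominates (simple O(n^2) scan)."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Made-up numbers for illustration only; they are not measurements from Vega.
candidates = [
    Config("high", area_m2=10_000, latency_s=60, quality=0.55),
    Config("medium", area_m2=5_000, latency_s=50, quality=0.70),
    Config("bad", area_m2=4_000, latency_s=55, quality=0.65),
    Config("low", area_m2=2_000, latency_s=40, quality=0.85),
]
frontier = pareto_frontier(candidates)
print([c.name for c in frontier])  # "bad" is dominated by "medium"
```

The quadratic scan is adequate for a handful of candidate altitudes or maneuvers. Larger candidate sets would call for a sort-based skyline algorithm.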
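The Vega abstract (item 2) also scales the approach out to a swarm using higher-order Voronoi tessellations. In an order-k Voronoi diagram, each region is the set of locations that share the same k nearest sites. Raising k therefore places every location within the responsibility of k drones rather than one. The following sketch discretizes the surveyed area into grid points and groups them by their k nearest drones, using Euclidean distance. The drone positions, the grid, and the helper names (`k_nearest`, `order_k_cells`, `load_per_drone`) are hypothetical. The abstract does not describe how Vega actually builds or uses its tessellation.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

Point = Tuple[float, float]


def k_nearest(p: Point, drones: List[Point], k: int) -> FrozenSet[int]:
    """Indices of the k drones nearest to p; this set labels p's order-k Voronoi cell.

    Ties in distance are broken by drone index because sorted() is stable.
    """
    ranked = sorted(range(len(drones)), key=lambda i: math.dist(p, drones[i]))
    return frozenset(ranked[:k])


def order_k_cells(grid: List[Point], drones: List[Point],
                  k: int) -> Dict[FrozenSet[int], List[Point]]:
    """Group grid points by their k-nearest-drone set (a discretized order-k diagram)."""
    cells: Dict[FrozenSet[int], List[Point]] = {}
    for p in grid:
        cells.setdefault(k_nearest(p, drones, k), []).append(p)
    return cells


def load_per_drone(cells: Dict[FrozenSet[int], List[Point]],
                   n_drones: int) -> List[int]:
    """Number of grid points each drone is responsible for (each point counts k times)."""
    load = [0] * n_drones
    for members, points in cells.items():
        for i in members:
            load[i] += len(points)
    return load


# Hypothetical 3-drone swarm over an 11 x 11 grid of points spaced 1 m apart.
drones = [(0.0, 0.0), (10.0, 0.0), (5.0, 10.0)]
grid = [(float(x), float(y)) for x in range(11) for y in range(11)]
for k in (1, 2):
    cells = order_k_cells(grid, drones, k)
    print(k, len(cells), load_per_drone(cells, len(drones)))
```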
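The attribution abstract (item 3) combines saliency maps with instance attribution, so that an influential training example can be broken down into the features that make it influential. The following Python sketch is a simplified stand-in, not the paper's method. It scores the training examples of a logistic-regression model by the dot product of their loss gradients with the test example's loss gradient, which is a gradient-similarity form of instance attribution. It then splits that score into exact per-feature terms, a decomposition that a dot product permits for a linear model. The vocabulary, weights, data, and function names are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss_grad(w, x, y):
    """Gradient of binary cross-entropy w.r.t. w for logistic regression: (p - y) * x."""
    return (sigmoid(w @ x) - y) * x


def instance_attribution(w, X_train, y_train, x_test, y_test):
    """Gradient-dot-product score of every training example for one test example."""
    g_test = loss_grad(w, x_test, y_test)
    return np.array([loss_grad(w, x, y) @ g_test for x, y in zip(X_train, y_train)])


def training_feature_attribution(w, x_i, y_i, x_test, y_test):
    """Per-feature terms of training example i's score; they sum to that score exactly."""
    return loss_grad(w, x_i, y_i) * loss_grad(w, x_test, y_test)


# Hypothetical bag-of-words toy data; "tok_x" is an illustrative token only.
vocab = ["great", "awful", "plot", "tok_x"]
X_train = np.array([[1, 0, 1, 1], [0, 1, 1, 0], [0, 1, 1, 2]], dtype=float)
y_train = np.array([1, 0, 1])
w = np.array([1.0, -1.0, 0.0, 2.0])  # hypothetical trained weights
x_test, y_test = np.array([0, 1, 1, 1], dtype=float), 0

scores = instance_attribution(w, X_train, y_train, x_test, y_test)
top = int(np.argmax(np.abs(scores)))  # most influential training example
phi = training_feature_attribution(w, X_train[top], y_train[top], x_test, y_test)
for name, v in sorted(zip(vocab, phi), key=lambda t: -abs(t[1])):
    print(f"{name}: {v:+.3f}")
```

Because the per-feature terms sum exactly to the instance score, the largest-magnitude terms indicate which features drive a training example's influence. Those features are candidates for manual inspection as possible artifacts.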
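The Virtuoso abstract (item 6) describes a multi-branch execution kernel whose branches sit at different accuracy-energy-latency operating points. It pairs this kernel with a lightweight runtime scheduler that picks the best-fit branch for the user requirement. The abstract does not give the selection rule, so the following Python sketch assumes a simple one: among the branches whose profiled per-frame latency and energy fit the user's budgets, pick the most accurate. The `Branch` profile table, its numbers, and `select_branch` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Branch:
    """Offline profile of one hypothetical execution branch on the target board."""
    name: str
    accuracy: float    # e.g. mAP on a validation set (higher is better)
    latency_ms: float  # per-frame latency
    energy_mj: float   # per-frame energy


def select_branch(branches: List[Branch], max_latency_ms: float,
                  max_energy_mj: float) -> Optional[Branch]:
    """Most accurate branch within both budgets, or None if no branch fits."""
    feasible = [b for b in branches
                if b.latency_ms <= max_latency_ms and b.energy_mj <= max_energy_mj]
    if not feasible:
        return None
    return max(feasible, key=lambda b: b.accuracy)


# Made-up profiles for illustration; they are not Virtuoso measurements.
branches = [
    Branch("full", accuracy=0.64, latency_ms=30.0, energy_mj=300.0),
    Branch("mid", accuracy=0.58, latency_ms=12.0, energy_mj=120.0),
    Branch("lite", accuracy=0.50, latency_ms=4.0, energy_mj=40.0),
]
choice = select_branch(branches, max_latency_ms=15.0, max_energy_mj=500.0)
print(choice.name if choice else "no feasible branch")
```

Returning `None` leaves the infeasible case to the caller, which might relax one budget or fall back to the cheapest branch. A real scheduler would also need to refresh these profiles as contention or video content changes, which this sketch ignores.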